YouCookII Dataset

Authors

  • Luowei Zhou
  • Chenliang Xu
  • Jason J. Corso
Abstract

Learning from instructional video is a promising direction that may help ground the vision and language problem. To move toward this goal, we collect a large-scale cooking video dataset, called YouCookII, with 2000 videos downloaded from YouTube. All the videos are untrimmed, recorded in unconstrained environments, and shot from a third-person viewpoint. They represent a more challenging visual problem than existing instructional datasets. The annotations include the temporal boundaries of the procedure steps in each video and the corresponding English description of each step. All the frame-wise features and annotations are available for download on the dataset webpage: http://youcook2.eecs.umich.edu.


Similar resources

Towards Automatic Learning of Procedures from Web Instructional Videos

We propose a temporal segmentation and procedure learning model for long untrimmed and unconstrained videos, e.g., videos from YouTube. The proposed model segments a video into segments that constitute a procedure and learns the underlying temporal dependency among the procedure segments. The output procedure segments can be applied for other tasks, such as video description generation or activ...


Finding “It”: Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos

Grounding textual phrases in visual content with standalone image-sentence pairs is a challenging task. When we consider grounding in instructional videos, this problem becomes profoundly more complex: the latent temporal structure of instructional videos breaks independence assumptions and necessitates contextual understanding for resolving ambiguous visual-linguistic cues. Furthermore, dense ...


End-to-End Dense Video Captioning with Masked Transformer

Dense video captioning aims to generate text descriptions for all events in an untrimmed video. This involves both detecting and describing events. Therefore, all previous methods on dense video captioning tackle this problem by building two models, i.e. an event proposal and a captioning model, for these two sub-problems. The models are either trained separately or in alternation. This prevent...


Evaluation of Updating Methods in Building Blocks Dataset

With the increasing use of spatial data in daily life, the production of this data from diverse information sources with different precision and scales has grown widely. Generating new data requires a great deal of time and money. Therefore, one solution to reduce costs is to update the old data at different scales using new data (produced on a similar scale). One approach to updating data i...


Correlation and Repeatability of Parametric and Multivariate Stability Statistics for Grain Yield in Rainfed Barley

Multi-environment trial data are required to obtain stability performance parameters as selection tools for effective cultivar evaluation. The interrelationship among several stability parameters and their associations with mean yield, along with the repeatability of these parameters in consecutive years was the objective of this study. Barley yield data of 18 cultivars, proprietary of Dryland ...




Publication date: 2017